Data Science: Visualization

Section 1: Introduction to Data Visualization and Distributions

You will get started with data visualization and distributions in R. After completing this section, you will:

understand the importance of data visualization for communicating data-driven findings.
be able to use distributions to summarize data.
be able to use the average and the standard deviation to understand the normal distribution.
be able to assess how well a normal distribution fits the data using a quantile-quantile plot.
be able to interpret data from a boxplot.

Section 2: Introduction to ggplot2

You will learn how to use the ggplot2 package to create plots.

Section 3: Summarizing with dplyr

You will learn how to summarize data using the dplyr package.

Section 4: Gapminder

You will see examples of ggplot2 and dplyr in action with the Gapminder dataset.

Section 5: Data Visualization Principles

You will learn general principles to guide you in developing effective data visualizations.

Section 1

Data Types

Functions Overview:

numeric

Code From Video:

numeric

Key Points:

Categorical data are variables that are defined by a small number of groups.

Ordinal categorical data have an inherent order to the categories (mild/medium/hot, for example).
Non-ordinal categorical data have no order to the categories.

Numerical data take a variety of numeric values.

Continuous variables can take any value.
Discrete variables are limited to sets of specific values.
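
The distinctions above can be sketched in base R. This is a minimal illustration, not code from the course video: all values and variable names (`spice`, `region`, `children`, `height_in`) are made up for the example. Ordered factors represent ordinal categories, plain factors represent non-ordinal ones, and numeric vectors hold discrete or continuous values.

```r
# All values below are hypothetical, made up for illustration.

# Ordinal categorical data: the levels have an inherent order.
spice <- factor(c("mild", "hot", "medium", "mild"),
                levels = c("mild", "medium", "hot"),
                ordered = TRUE)

# Non-ordinal categorical data: the levels have no order.
region <- factor(c("north", "south", "west", "south"))

is.ordered(spice)    # TRUE
is.ordered(region)   # FALSE

# Comparisons are only meaningful for ordered factors.
spice_below_hot <- spice < "hot"

# Discrete numerical data: limited to specific values (here, whole counts).
children <- c(0L, 2L, 1L, 3L)

# Continuous numerical data: can take any value in a range.
height_in <- c(68.5, 70.25, 64.1, 71.9)

class(children)    # "integer"
class(height_in)   # "numeric"
```

Storing ordinal data as an ordered factor keeps the category order available to functions such as `table()` and to plotting code, which would otherwise sort the labels alphabetically.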

DataCamp Data Types

Code:

table() # counts how often each unique value occurs
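
As a small sketch of how `table()` might be used, the example below counts the values of a made-up character vector (`sex` here is hypothetical and is not the dslabs data) and converts the counts to proportions with `prop.table()`.

```r
# Hypothetical categorical values, made up for illustration.
sex <- c("Male", "Female", "Male", "Male", "Female")

# Frequency of each unique value.
counts <- table(sex)
counts

# Proportions: each count divided by the total.
props <- prop.table(counts)
props
```

The proportions from `prop.table()` always sum to 1, which makes them a simple summary of the distribution of a categorical variable.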

1.2 Intro to Distributions

DataCamp Assessment: Normal distribution

Code:

library(dslabs)

data(heights)

x <- heights$height[heights$sex == "Male"]

# What proportion of the data is between 69 and 72 inches
# (taller than 69 but shorter than or equal to 72)?
# The mean of a logical vector is the share of TRUE values, so the result is between 0 and 1.
mean(x > 69 & x <= 72)
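
To see why `mean()` of a logical condition gives a proportion, and how the same quantity could be approximated with the normal distribution, here is a minimal sketch on a small made-up vector. The values in `h` are hypothetical and are not the dslabs heights; with only eight values the normal approximation is shown for the mechanics, not for accuracy.

```r
# Hypothetical heights in inches, made up for illustration.
h <- c(66, 68, 70, 71, 72, 74, 69, 73)

# Logical vector: TRUE where the height is in (69, 72].
in_range <- h > 69 & h <= 72

# TRUE counts as 1 and FALSE as 0, so the sum is a count
# and the mean is a proportion.
n_in_range <- sum(in_range)      # 3 (the values 70, 71 and 72)
prop_actual <- mean(in_range)    # 3 / 8 = 0.375

# Normal approximation: use the average and standard deviation
# of the data as the parameters of the normal distribution.
avg <- mean(h)
stdev <- sd(h)
prop_approx <- pnorm(72, avg, stdev) - pnorm(69, avg, stdev)

prop_actual
prop_approx
```

Applied to the `x` vector defined above, the same two calculations let you compare the actual proportion with the proportion predicted by a normal distribution.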